Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. --Jamie Zawinski, in comp.lang.emacs
Regular expressions (RE) are used to extremes in Perl and are available in many languages. The following is designed as a quick reference / memory jog for experienced RE users. Any new users should: A) find another solution, B) copy existing working code, C) join a newsgroup or mailing list and ask for help, or D) take a class. RE is like shaking hands with an octopus.
Matches
^ beginning
$ end
. any character
[.-.] any character from the first "." to the second where . is any character
e.g. [A-Z] matches any uppercase letter
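The matchers above can be tried out with a short sketch. It assumes Python's re module rather than Perl, and the sample strings are made up for illustration.

```python
import re

# ^ and $ anchor the pattern to the beginning and end of the string.
starts_with_the = re.search(r"^the", "the food") is not None
ends_with_barn = re.search(r"barn$", "in the barn") is not None

# . matches any single character (except a newline, by default).
dot_match = re.search(r"b.rn", "in the barn").group(0)

# [A-Z] matches one uppercase letter; search() returns the first one found.
first_upper = re.search(r"[A-Z]", "hello World").group(0)

print(starts_with_the, ends_with_barn, dot_match, first_upper)
```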
Literals
\. Quote. Treats "." as a literal value where . is any character
e.g. \$ matches the dollar sign, not the end of line.
\### Byte where ### are three octal digits.
\x## Byte where ## are two hexadecimal digits.
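A minimal sketch of the three literal forms, again assuming Python's re module (which accepts the same backslash, octal, and hex escapes); the input strings are invented for the example.

```python
import re

# \$ matches a literal dollar sign instead of anchoring to the end.
price = re.search(r"\$\d+", "cost: $42 today").group(0)

# \x41 is the character with hexadecimal code 41, which is "A".
hex_match = re.search(r"\x41", "xAy").group(0)

# \101 is the same character written as three octal digits.
octal_match = re.search(r"\101", "xAy").group(0)

print(price, hex_match, octal_match)
```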
Flow control
(.*) Group. Matches everything in the parens or nothing. Saves the match in $# where # counts up the groups.
e.g. Time: (..):(..):(..) will put the hours in $1, minutes in $2 and seconds in $3.
.*|.* Or. If the pattern before the "|" fails to match, it will try the pattern after.
e.g. A|B will match A or B
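As a rough illustration of groups and alternation, here is the Time: pattern from above run through Python's re module. This is an assumption of convenience: Python exposes the saved groups as m.group(1), m.group(2), ... rather than $1, $2, ..., and the input strings are hypothetical.

```python
import re

# Each pair of parens saves what it matched; groups are numbered from 1.
m = re.search(r"Time: (..):(..):(..)", "Time: 12:34:56")
hours, minutes, seconds = m.group(1), m.group(2), m.group(3)

# A|B matches either alternative wherever it occurs.
a_or_b = re.findall(r"A|B", "CABBAGE")

print(hours, minutes, seconds, a_or_b)
```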
Repeat
* 0 or more times. Same as {0,}. Greedy: will "eat" as much as it can unless followed by ? or limited by what comes next in the pattern
+ 1 or more times. Same as {1,}. Greedy: will "eat" as much as it can unless followed by ? or limited by what comes next in the pattern
? 0 or 1 times. Same as {0,1}
{n} Match exactly n times
{n,} Match at least n times. Greedy: will "eat" as much as it can unless followed by ? or limited by what comes next in the pattern
{n,m} Match at least n but not more than m times.
.*? Match the minimum number of times possible where .* is one of the repeat patterns above.
e.g. foo(.*)bar used against "the food is barbecued in the barn" will set $1 to "d is barbecued in the " but foo(.*?)bar will set it to "d is ". Notice that foo(.*)barb will also produce "d is "
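The greedy versus minimal behaviour in the example above can be checked with a few lines of Python. This sketch assumes Python's re module, where group(1) plays the role of $1.

```python
import re

text = "the food is barbecued in the barn"

# Greedy: .* runs to the end, then gives characters back until the
# last "bar" (the one in "barn") can match.
greedy = re.search(r"foo(.*)bar", text).group(1)

# Minimal: .*? stops at the first place "bar" can match.
lazy = re.search(r"foo(.*?)bar", text).group(1)

# Greedy again, but "barb" occurs only once, so the result is short.
greedy_barb = re.search(r"foo(.*)barb", text).group(1)

print(repr(greedy), repr(lazy), repr(greedy_barb))
```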
For a regular expression to match, the entire regular expression must match, not just part of it. So if the beginning of a pattern containing a quantifier succeeds in a way that causes later parts in the pattern to fail, the matching engine backs up and recalculates the beginning part--that's why it's called backtracking.
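Backtracking can be observed with a small sketch, assuming Python's re module and a made-up input string. The greedy \d+ first takes all five digits, the following \d then has nothing left to match, so the engine backs up one digit and tries again.

```python
import re

# \d+ initially grabs "12345"; it must give back "5" so the final \d can match.
m = re.match(r"(\d+)(\d)", "12345")
first, last = m.group(1), m.group(2)

# If no amount of backing up helps, the whole match fails.
no_match = re.match(r"\d+x", "12345")

print(first, last, no_match)
```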
See also:
| City state zip | \s*(.*)\s*,\s*([A-Z]{2})\s+(\d{5}(\-\d{4})?)\s* |
| HTML eMail with only an image in it | The following expression will match a message that contains one or more images and no text at all: <BODY[^>]*>(<[^>]+>|\n|\r)*<IMG[^>]+>(<[^>]+>|\n|\r)*</BODY> |
| HTML eMail with an image | <BODY[^>]*>(<[^>]+>|\n|\r|\s)*<IMG[^>]*src=['"]?cid: |
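As a hedged sketch of how the city, state, zip pattern might be used, here it is in Python's re module with single braces in the repeat counts. The parse_address helper and the sample addresses are hypothetical and exist only for this illustration.

```python
import re

# City, two-letter state, five-digit zip with an optional -NNNN suffix.
CITY_STATE_ZIP = re.compile(r"\s*(.*)\s*,\s*([A-Z]{2})\s+(\d{5}(\-\d{4})?)\s*")

def parse_address(line):
    """Return (city, state, zip) or None if the line does not match."""
    m = CITY_STATE_ZIP.match(line)
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(3)

with_plus4 = parse_address("San Diego, CA 92101-1234")
plain_zip = parse_address("Austin, TX 73301")
bad_line = parse_address("no comma here")

print(with_plus4, plain_zip, bad_line)
```

Group 4 holds only the optional -NNNN suffix, so the helper ignores it and returns group 3, which contains the full zip.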